to annotate an unknown protein on the second BLAST run by including the unknown
sequences in my search on the first run, i.e. doing an iterative search with a
position-specific scoring matrix (which is what PSI, position-specific iterated, stands for
in PSI-BLAST). But if you want to do this well, you should make many additional
predictions, including looking at the structure, the localization and the domains; only
then do you get quite good results and insights into the function (see also the PROSITE
and AnDom examples, and the work of Gaudermann P, Vogl I, Zientz E et al. (2006)
Analysis of and function predictions for previously conserved hypothetical or putative
proteins in Blochmannia floridanus. BMC Microbiol 6:1). If one wants to be
more precise, like the ENCODE consortium, and find all regulatory elements in a genome
(and not just the proteins or genes), then it is advisable to map out conserved regions by
comparison with closely related genomes and also to use motif search programs such as
MEME, part of the MEME Suite of motif-based sequence analysis tools (for this, read the
paper https://www.sdsc.edu/~tbailey/papers/meme.ml.pdf and refer to the web site
https://meme-suite.org/doc/meme.html).
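To make the idea of a position-specific matrix more concrete, here is a minimal sketch in Python. It is not the MEME algorithm (MEME discovers motifs de novo by expectation maximization) and not PSI-BLAST itself; it only illustrates how a matrix built from already aligned sites can score new sequence windows. The function names `build_pwm` and `scan`, the example sites, the uniform background and the pseudocount of 1 are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    Assumptions (illustrative only): uniform background frequencies and a
    simple additive pseudocount per letter.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for letter in alphabet:
            freq = (column.count(letter) + pseudocount) / total
            scores[letter] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def scan(sequence, pwm):
    """Return (best_score, start) of the highest-scoring window.

    Returns None if the sequence is shorter than the matrix. Letters outside
    the alphabet (e.g. N) are not handled in this sketch and raise KeyError.
    """
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][window[i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical aligned binding sites (made up for this example).
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
example_pwm = build_pwm(example_sites)
best_score, best_start = scan("GGCGTATAATGCGC", example_pwm)
print(best_start)  # 4, the window TATAAT
```

In practice one would rather use the tools named above, since they also estimate the statistical significance of a match, which this sketch does not attempt.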
The general-purpose software RepeatMasker (https://www.repeatmasker.org) is very handy
for identifying repetitive elements (recurring units). We have also developed our own
server, L1base, which finds LINE elements, i.e. large, repetitive, selfish DNA sequences
(https://line1.bioapps.biozentrum.uni-wuerzburg.de/; from there you are redirected to the
Charité page, https://l1base.charite.de, which shows the current further development of
the server and its documentation). Another possibility is to search for repeats in protein
sequences, where the tool REPRO (based on Smith-Waterman local alignment and subsequent
iterative clustering; https://www.ibi.vu.nl/programs/reprowww/) is very useful. Again, the
documentation on the website is recommended. Genome annotation then quickly becomes a
science in itself. For the human genome, relevant sites are already recommended in the
book chapter, but they are also mentioned here. The ENCODE entry page already mentioned also
19.1 Genomic Data: From Sequence to Structure and Function